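Intensity diffraction tomography (IDT) refers to a class of optical microscopy techniques for imaging the 3D refractive index (RI) distribution of a sample from a set of intensity-only 2D measurements. Because phase information is lost and because of the missing-cone problem, reconstructing artifact-free RI maps is a fundamental challenge in IDT. Neural fields (NF) have recently emerged as a new deep learning (DL) approach for learning continuous representations of physical fields. An NF uses a coordinate-based neural network to represent the field by mapping spatial coordinates to the corresponding physical quantities, in our case complex-valued refractive index values. We present DeCAF as the first NF-based IDT method that can learn a high-quality continuous representation of an RI volume from intensity-only, limited-angle measurements. The representation in DeCAF is learned directly from the measurements of the test sample using the IDT forward model, without any ground-truth RI maps. We evaluate DeCAF qualitatively and quantitatively on simulated and experimental biological samples. Our results show that DeCAF can generate high-contrast, artifact-free RI maps and leads to up to a 2.1-fold reduction in MSE over existing methods.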
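The abstract describes a neural field as a network that maps spatial coordinates to complex RI values. The sketch below illustrates only that coordinate-to-value mapping. DeCAF's actual encoding and architecture are not given here, so the Fourier-feature encoding, the two-layer ReLU MLP, and the names positional_encoding and TinyField are all assumptions. The network is untrained. In DeCAF, the weights would be fitted so that the IDT forward model applied to the field reproduces the measured intensities; that step is omitted.

```python
import math
import random
from typing import List, Sequence


def positional_encoding(coord: Sequence[float], num_freqs: int = 4) -> List[float]:
    """Concatenate the raw coordinate with sin/cos features at frequencies 2^k * pi (assumed encoding)."""
    feats = list(coord)
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for c in coord:
            feats.append(math.sin(freq * c))
            feats.append(math.cos(freq * c))
    return feats


class TinyField:
    """Hypothetical untrained coordinate MLP: (x, y, z) -> complex refractive-index value."""

    def __init__(self, num_freqs: int = 4, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        in_dim = 3 + 3 * 2 * num_freqs  # raw xyz plus sin/cos per axis per frequency
        self.num_freqs = num_freqs
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                   for _ in range(hidden)]
        # Two output rows: real part and imaginary part of the RI value.
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(2)]

    def __call__(self, coord: Sequence[float]) -> complex:
        x = positional_encoding(coord, self.num_freqs)
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]  # ReLU layer
        re, im = (sum(w * hi for w, hi in zip(row, h)) for row in self.w2)
        return complex(re, im)
```

Because the field is a function of continuous coordinates, it can be queried at any point, for example TinyField()((0.1, -0.2, 0.05)), rather than only on a fixed voxel grid.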
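Artificial intelligence (AI) offers a promising way to streamline COVID-19 diagnosis. However, concerns about security and trustworthiness impede the collection of large-scale, representative medical data, posing a considerable challenge for training well-generalized models for clinical practice. To address this, we launched the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which AI models are trained in a distributed manner and executed independently at each host institution under a federated learning (FL) framework, without data sharing. Here we show that our FL model outperforms the locally trained models by a large margin (test sensitivity/specificity in China: 0.973/0.951; in the UK: 0.730/0.942) and achieves performance comparable to a panel of professional radiologists. We further evaluated the model on held-out data (collected from two additional hospitals excluded from FL) and heterogeneous data (acquired with contrast materials), provided visual explanations of the model's decisions, and analyzed the trade-off between model performance and communication cost during federated training. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected at 23 hospitals in China and the UK. Collectively, our work advances the prospects of using federated learning for privacy-preserving digital health.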
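The abstract does not state which aggregation rule UCADI uses. The sketch below assumes federated averaging (FedAvg), the most common choice. Each site trains on its own data, and the server averages only the resulting weights, weighted by local sample count. The toy one-parameter model and the names local_update, federated_average, and run_rounds are hypothetical.

```python
from typing import List, Sequence, Tuple

Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs that never leave a hospital


def local_update(w: float, data: Dataset, lr: float = 0.1, steps: int = 1) -> float:
    """Gradient descent on mean squared error for the toy model y ≈ w * x, run on-site."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(weights: Sequence[float], sizes: Sequence[int]) -> float:
    """FedAvg-style aggregation: average client weights, weighted by local sample count."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(weights, sizes)) / total


def run_rounds(global_w: float, clients: List[Dataset], rounds: int) -> float:
    for _ in range(rounds):
        # Each client trains locally; only the updated weight is sent to the server.
        local_ws = [local_update(global_w, data) for data in clients]
        global_w = federated_average(local_ws, [len(d) for d in clients])
    return global_w
```

With two sites whose data both follow y = 2x, for example [[(1.0, 2.0), (2.0, 4.0)], [(0.5, 1.0), (1.5, 3.0)]], run_rounds(0.0, sites, 50) converges to about 2.0 even though no raw pair is ever pooled.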
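Establishing visual correspondence across images is a challenging and essential task. Recently, a large number of self-supervised methods have been proposed to better learn representations for visual correspondence. However, we find that these methods often fail to leverage semantic information and over-rely on matching low-level features. In contrast, human vision is capable of distinguishing between distinct objects as a pretext to tracking. Inspired by this paradigm, we propose to learn semantic-aware fine-grained correspondence. First, we demonstrate that semantic correspondence is implicitly available through a rich set of image-level self-supervised methods. We further design a pixel-level self-supervised learning objective that specifically targets fine-grained correspondence. For downstream tasks, we fuse these two kinds of complementary correspondence representations, showing that they boost performance synergistically. Using convolutional networks, our method surpasses previous state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video object segmentation, human pose tracking, and human part tracking.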
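Correspondence representations like these are often evaluated on video object segmentation by propagating labels from a reference frame through feature affinity. The sketch below shows that generic top-k affinity protocol. It is not necessarily the exact procedure used in this work, and the names, k, and temperature are illustrative assumptions.

```python
import math
from typing import Hashable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def propagate_labels(ref_feats: Sequence[Sequence[float]], ref_labels: Sequence[Hashable],
                     tgt_feats: Sequence[Sequence[float]], k: int = 5,
                     temperature: float = 0.07) -> List[Hashable]:
    """For each target pixel, vote among its top-k most similar reference pixels."""
    out = []
    for q in tgt_feats:
        sims = [(cosine(q, r), lbl) for r, lbl in zip(ref_feats, ref_labels)]
        top = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
        best = top[0][0]
        votes: dict = {}
        for s, lbl in top:
            # Unnormalized softmax weight; normalizing would not change the argmax.
            votes[lbl] = votes.get(lbl, 0.0) + math.exp((s - best) / temperature)
        out.append(max(votes, key=votes.get))
    return out
```

Better correspondence features make the affinity pick the right reference pixels, which is why such benchmarks measure representation quality.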
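Many recent approaches in contrastive learning have worked to close the gap between pretraining on iconic images such as ImageNet and pretraining on complex scenes such as COCO. This gap exists largely because the commonly used random-crop augmentation produces semantically inconsistent content in crowded scene images containing diverse objects. Previous works use preprocessing pipelines to localize salient objects for improved cropping, but an end-to-end solution remains elusive. In this work, we propose a framework that accomplishes this goal by jointly learning representations and segmentation. We leverage segmentation masks to train a model with a mask-dependent contrastive loss, and use the partially trained model to bootstrap better masks. By iterating between these two components, we ground the contrastive updates in segmentation information while simultaneously improving segmentation throughout training. Experiments show that our representations transfer robustly to downstream tasks in classification, detection, and segmentation.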
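The abstract does not define its mask-dependent contrastive loss. One common way to make a contrastive loss depend on masks is to pool pixel embeddings within each mask segment and then apply InfoNCE across two augmented views. In that setup, the same segment in the other view is the positive and the other segments are negatives. The sketch below assumes that form. It also assumes both views share segment ids, and its function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vec = List[float]


def mask_pool(pixel_feats: Sequence[Vec], mask: Sequence[int]) -> Dict[int, Vec]:
    """Average the pixel embeddings that share a segment id."""
    sums: Dict[int, Vec] = {}
    counts: Dict[int, int] = {}
    for f, seg in zip(pixel_feats, mask):
        if seg not in sums:
            sums[seg] = [0.0] * len(f)
            counts[seg] = 0
        sums[seg] = [s + x for s, x in zip(sums[seg], f)]
        counts[seg] += 1
    return {seg: [s / counts[seg] for s in vec] for seg, vec in sums.items()}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mask_contrastive_loss(view1: Dict[int, Vec], view2: Dict[int, Vec],
                          temperature: float = 0.1) -> float:
    """Segment-level InfoNCE: the same segment id in the other view is the positive."""
    segs = sorted(set(view1) & set(view2))
    total = 0.0
    for i, s in enumerate(segs):
        logits = [cosine(view1[s], view2[t]) / temperature for t in segs]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(lg - m) for lg in logits))
        total += log_sum_exp - logits[i]  # negative log-softmax of the positive
    return total / len(segs)
```

In the paper's loop, the masks used for pooling would themselves be refined by the partially trained model. This sketch treats them as fixed inputs.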
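Object detection has achieved promising performance on clean datasets, but how to achieve a better trade-off between adversarial robustness and clean precision remains under-explored. Adversarial training is the mainstream method for improving robustness, but most works sacrifice clean precision relative to standard training in order to gain robustness. In this paper, we propose Unified Decoupled Feature Alignment (UDFA), a novel fine-tuning paradigm that achieves better performance than existing methods by fully exploring the combination of self-knowledge distillation and adversarial training for object detection. We first use decoupled foreground/background features to construct a self-knowledge distillation branch between the clean feature representation from the pretrained detector (serving as the teacher) and the adversarial feature representation from the student detector. We then explore self-knowledge distillation from a new angle by decoupling the original branch into a self-supervised learning branch and a new self-knowledge distillation branch. Extensive experiments on the PASCAL-VOC and MS-COCO benchmarks show that UDFA surpasses both standard training and state-of-the-art adversarial training methods for object detection. For example, compared with the teacher detector, our method on GFLV2 with ResNet-50 improves clean precision by 2.2 AP on PASCAL-VOC. Compared with SOTA adversarial training methods, our method improves clean precision by 1.6 AP while also improving adversarial robustness by 0.5 AP. Our code will be available at https://github.com/grispeut/udfa.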
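The decoupled distillation branch aligns the student's adversarial features with the teacher's clean features, treating foreground and background separately. The exact distance and weighting in UDFA are not stated in the abstract. The sketch below assumes a squared-error distance averaged separately over foreground and background locations, combined with hypothetical weights fg_weight and bg_weight.

```python
from typing import List, Sequence

Vec = List[float]


def decoupled_alignment_loss(teacher: Sequence[Vec], student: Sequence[Vec],
                             fg_mask: Sequence[bool], fg_weight: float = 1.0,
                             bg_weight: float = 1.0) -> float:
    """Squared feature distance, averaged separately over foreground and background locations.

    teacher: clean features from the frozen pretrained detector (one vector per location).
    student: adversarial features from the detector being fine-tuned.
    """
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for t, s, is_fg in zip(teacher, student, fg_mask):
        sums[bool(is_fg)] += sum((ti - si) ** 2 for ti, si in zip(t, s))
        counts[bool(is_fg)] += 1
    fg = sums[True] / counts[True] if counts[True] else 0.0
    bg = sums[False] / counts[False] if counts[False] else 0.0
    return fg_weight * fg + bg_weight * bg
```

Averaging the two regions separately keeps the typically few foreground locations from being swamped by the many background locations. This is one plausible motivation for decoupling, not a claim about the paper's exact design.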
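Deep multi-view clustering methods have achieved remarkable performance. However, all of them fail to consider the difficulty labels (the uncertainty of the ground truth for training samples) of multi-view samples, which may leave the clustering network stuck in poor local optima during training. Worse still, the difficulty labels of multi-view samples are always inconsistent, which makes the problem even more challenging. In this paper, we propose a novel Deep Adversarial Inconsistent Cognitive Sampling (DAICS) method for multi-view progressive subspace clustering. A multi-view binary classification (easy or difficult) loss and a feature similarity loss are proposed to jointly learn a binary classifier and a deep consistent feature embedding network through an adversarial minimax game over the difficulty labels of multi-view consistent samples. We develop a multi-view cognitive sampling strategy that selects input samples from easy to difficult for training the multi-view clustering network. However, the distributions of easy and difficult samples are mixed together, so achieving this goal is not trivial. To resolve this, we define a sampling probability with a theoretical guarantee. Based on it, a golden section mechanism is further designed to generate a sample-set boundary that progressively selects samples with varying difficulty labels via a gate unit. This gate unit is used to jointly learn a multi-view common progressive subspace and a clustering network for more efficient clustering. Experimental results on four real-world datasets demonstrate the superiority of DAICS.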
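The abstract does not say how the golden section produces the sample-set boundary. The sketch below is one hypothetical reading and should not be taken as the authors' method. At each stage, the boundary advances by a golden-ratio share of the remaining difficulty range. A gate then admits samples whose estimated difficulty probability is at or below the boundary, so training moves from easy to difficult samples. The names golden_boundaries and gate_select are illustrative.

```python
import math
from typing import List, Sequence

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618


def golden_boundaries(num_stages: int) -> List[float]:
    """Hypothetical schedule: each stage admits a golden-ratio share of the remaining range."""
    bounds, b = [], 0.0
    for _ in range(num_stages):
        b = b + (1.0 - b) * GOLDEN
        bounds.append(b)
    return bounds


def gate_select(difficulty_probs: Sequence[float], boundary: float) -> List[int]:
    """Gate unit: indices of samples whose difficulty probability is at most the boundary."""
    return [i for i, p in enumerate(difficulty_probs) if p <= boundary]
```

For difficulty probabilities [0.1, 0.9, 0.5], the first boundary (about 0.618) admits samples 0 and 2, and the third (about 0.944) admits all three.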
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding 3D points into the multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
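The 73.0% figure is the nuScenes detection score (NDS), which combines mAP with five true-positive error metrics: translation, scale, orientation, velocity, and attribute. The benchmark defines it as NDS = (5 · mAP + Σ (1 − min(1, mTP))) / 10, with the sum taken over those five metrics. The helper name below is illustrative.

```python
from typing import Mapping

TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: Mapping[str, float]) -> float:
    """NDS = (5 * mAP + sum over TP metrics of (1 - min(1, error))) / 10."""
    # Errors are clipped at 1 so a very bad metric cannot make its term negative.
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0
```

mAP carries half of the total weight, so a detector with perfect localization but mAP of 0.5 scores 0.75.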
Dataset distillation has emerged as a prominent technique for improving data efficiency when training machine learning models. It encapsulates the knowledge of a large dataset into a smaller synthetic dataset, and a model trained on this distilled dataset can attain performance comparable to a model trained on the original training dataset. However, existing dataset distillation techniques mainly aim at achieving the best trade-off between resource efficiency and model utility, and the security risks they introduce have not been explored. This study performs the first backdoor attack against models trained on data distilled by dataset distillation methods in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers throughout the distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent them.
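The sketch below illustrates two ideas from the abstract: NAIVEATTACK's step of stamping a trigger onto raw data, and the ASR metric used to score the attacks. It assumes a simple fixed-patch trigger on a grayscale image, which the abstract does not specify. It also assumes the usual ASR definition: the fraction of triggered test inputs, not originally of the target class, that the model labels as the target. The paper's exact protocol may differ, and the function names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]


def stamp_trigger(image: Image, trigger: Image, top: int, left: int) -> Image:
    """Return a copy of the image with the trigger patch written at (top, left)."""
    out = [row[:] for row in image]  # leave the original image untouched
    for i, trow in enumerate(trigger):
        for j, value in enumerate(trow):
            out[top + i][left + j] = value
    return out


def attack_success_rate(predictions: Sequence[int], true_labels: Sequence[int],
                        target_label: int) -> float:
    """Share of triggered inputs (excluding true target-class ones) predicted as the target."""
    pairs = [(p, y) for p, y in zip(predictions, true_labels) if y != target_label]
    if not pairs:
        return 0.0
    return sum(p == target_label for p, _ in pairs) / len(pairs)
```

Inputs that already belong to the target class are excluded, because predicting the target on them says nothing about the backdoor.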
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) that fully explores the relationship between support and query features within a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features, and then propose to link object queries via cross-attention for better calibration. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shot counts. For example, it boosts nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method in the 10/30-shot settings. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
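A standard way to turn support masks into a class center is masked average pooling: average the support features over the pixels inside the mask. The sketch below does that. It then re-weights query features channel-wise with a sigmoid gate derived from the center. RefT's actual weighting module is not described in detail, so the sigmoid gating and the function names are assumptions.

```python
import math
from typing import List, Sequence

Vec = List[float]


def masked_average_pool(support_feats: Sequence[Vec], support_mask: Sequence[int]) -> Vec:
    """Class center: mean of the support feature vectors at pixels where the mask is 1."""
    selected = [f for f, m in zip(support_feats, support_mask) if m]
    if not selected:
        raise ValueError("support mask selects no pixels")
    return [sum(col) / len(selected) for col in zip(*selected)]


def reweight_query(query_feats: Sequence[Vec], center: Vec) -> List[Vec]:
    """Scale each query channel by sigmoid(center[c]) (assumed gating form)."""
    gate = [1.0 / (1.0 + math.exp(-c)) for c in center]
    return [[q * g for q, g in zip(vec, gate)] for vec in query_feats]
```

Pixels outside the mask are ignored, so background clutter in the support image does not shift the class center.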
This paper focuses on designing efficient models with few parameters and low FLOPs for dense prediction. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block. We argue that the specific instantiation is very important to model performance even when models share the same framework. Motivated by this observation, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications. The iRMB combines CNN-like efficiency for modeling short-distance dependencies with Transformer-like dynamic modeling for learning long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) built only from a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods. For example, EMO-1M/2M/5M achieve 71.5%, 75.1%, and 78.4% Top-1 accuracy, surpassing SoTA CNN- and Transformer-based models while balancing accuracy and efficiency well.
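The Meta-Mobile Block abstracts a shared skeleton: expand the channels, apply some efficient operator, shrink back, and add a residual. The sketch below shows only that skeleton on a sequence of token vectors, with a channel-wise 1D convolution as the operator. It omits the attention part of iRMB. The function names are hypothetical, and the conv applies one shared kernel to each channel independently.

```python
from typing import Callable, List, Sequence

Vec = List[float]
Tokens = List[Vec]


def linear(tokens: Sequence[Vec], weight: Sequence[Vec]) -> Tokens:
    """Apply the same (out_dim x in_dim) matrix to every token, like a 1x1 conv."""
    return [[sum(w * x for w, x in zip(row, t)) for row in weight] for t in tokens]


def depthwise_conv1d(tokens: Sequence[Vec], kernel: Sequence[float]) -> Tokens:
    """Per-channel 1D convolution over the token axis (odd kernel, zero padding, no channel mixing)."""
    k, n, c = len(kernel), len(tokens), len(tokens[0])
    pad = k // 2
    return [[sum(kernel[j] * tokens[i + j - pad][ch] for j in range(k) if 0 <= i + j - pad < n)
             for ch in range(c)] for i in range(n)]


def meta_mobile_block(tokens: Tokens, expand: Sequence[Vec],
                      operator: Callable[[Tokens], Tokens], shrink: Sequence[Vec]) -> Tokens:
    """Residual around expand -> operator -> shrink (expand: hidden x in, shrink: in x hidden)."""
    hidden = operator(linear(tokens, expand))
    delta = linear(hidden, shrink)
    return [[x + d for x, d in zip(t, dt)] for t, dt in zip(tokens, delta)]
```

Swapping the operator argument, for example to a local attention function, yields a different instantiation of the same block. This mirrors the abstract's point that the instantiation matters even though the framework is shared.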